Development and assessment of novel machine learning models to predict the probability of postoperative nausea and vomiting for patient-controlled analgesia

Postoperative nausea and vomiting (PONV) can lead to various postoperative complications. A risk assessment model for PONV helps guide treatment and reduce the incidence of PONV, but published PONV models do not have high accuracy. This study aimed to collect data from patients in Sichuan Provincial People’s Hospital to develop models for predicting PONV based on machine learning algorithms, and to evaluate the predictive performance of the models using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall rate, F1 value and area under the precision-recall curve (AUPRC). The AUC (0.947) of our best machine learning model was significantly higher than that of past models. The best model was externally validated on patients from Chengdu First People’s Hospital, where the AUC was 0.821. The contributions of variables were also interpreted using SHapley Additive ExPlanation (SHAP). A history of motion sickness and/or PONV, sex, weight, history of surgery, infusion volume, intraoperative urine volume, age, BMI, height, and PCA_3.0 were the ten most important variables for the model. The machine learning models provided a good preoperative prediction of PONV for patients receiving intravenous patient-controlled analgesia.


Model establishment.
A total of 54 prediction models were established using nine machine learning algorithms, three variable selection methods, two data sampling methods, and random forest imputation. Samples from the test set were used to evaluate the impact of different data processing methods and machine learning algorithms on model predictive performance. The results showed that predictive performance differed across data imputation, data sampling, variable selection and machine learning methods (Tables 1, 2).
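The count of 54 models is consistent with a full cross of the listed choices (9 algorithms × 3 selection methods × 2 sampling methods). The sketch below enumerates such a grid; the identifier strings are illustrative labels, and treating random forest imputation as a single fixed step is an assumption inferred from the arithmetic, not something the text states outright.

```python
from itertools import product

# Labels follow the methods named in the paper; the exact strings are illustrative.
ALGORITHMS = [
    "logistic_regression", "random_forest", "sgd", "xgboost", "knn",
    "svc", "decision_tree", "catboost", "mlp",
]
SELECTORS = ["boruta", "lasso", "rfe"]
SAMPLERS = ["smote", "bsmote"]
IMPUTERS = ["random_forest"]  # assumed to be a single fixed step


def build_grid():
    """Return every (imputer, sampler, selector, algorithm) combination."""
    return list(product(IMPUTERS, SAMPLERS, SELECTORS, ALGORITHMS))


grid = build_grid()
print(len(grid))  # 1 * 2 * 3 * 9 combinations
```

Each tuple in `grid` would correspond to one trained pipeline, so results tables can be keyed by the same tuple.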
Model evaluation. The AUC, accuracy, precision, recall rate, F1 value, and AUPRC were used to evaluate the performance of the models. The AUCs of the five best models were 0.5995, 0.6501, 0.9107, 0.9444, and 0.9469. The model combining the Lasso screening method, BSMOTE sampling and the CatBoost algorithm had the best performance (AUC = 0.9469). We applied the best model to patients from Chengdu First People's Hospital for external validation, where the AUC reached 0.8211 (Table 3). To determine the clinical usefulness of the model by quantifying the net benefits, decision curve analyses were performed for this prediction model. The receiver operating characteristic curves of the five best models are shown in Fig. 1.
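Decision curve analysis rests on the net benefit at a chosen risk threshold p_t, conventionally defined as TP/n − (FP/n) · p_t/(1 − p_t). The following is a minimal sketch of that standard formula on made-up toy data; it is not the authors' implementation, and the function names and example values are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold: the harm of one false positive relative to
    # the benefit of one true positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical toy data: 1 = PONV occurred, 0 = no PONV.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]

for t in (0.2, 0.5):
    print(t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t))
```

A decision curve plots these values over a range of thresholds, alongside the treat-all line and the treat-none line (net benefit 0).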

Model interpretation.
We used the SHapley Additive ExPlanation (SHAP) value to explain the contribution of the variables to the model. Fig. 2A shows the SHAP estimate of the contribution of each feature value in each sample to the prediction. No history of motion sickness and/or PONV, male sex, no history of surgery, no dexmedetomidine use, no ephedrine use and young age provided a negative contribution. The number of possible combinations of variables increased exponentially as the number of variables increased, and the selection order of combined variables affected the SHAP value. The SHAP values of the top 10 combination variables are shown in Fig. 2B. The results showed that history of motion sickness and/or PONV, sex, weight, history of surgery, infusion volume, intraoperative urine volume, age, BMI, height, and PCA_3.0 were the ten most important variables for the model. In our study, the analgesic pump formulation had three options: the first formulation was sufentanil, the second was hydromorphone, and the third was sufentanil and nonsteroidal anti-inflammatory drugs (NSAIDs). PCA_3.0 is the third option for the formulation of analgesic pumps. For the prediction model, the higher the SHAP value of a variable, the more likely PONV was. The redder the color of a variable, the larger its value; the bluer the color, the smaller its value. Females, older patients, short patients, patients with a history of motion sickness and/or PONV, light weight
Sample size assessment. As the size of the sample data increased, the AUC values of the test set continued to increase, which shows that a sufficient sample size was included in this study (Fig. 3).
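For the SHAP attribution discussed under Model interpretation, the underlying quantity is the Shapley value, which averages a variable's marginal contribution over all subsets of the other variables. This is why the cost grows exponentially with the number of variables. The sketch below computes exact Shapley values for a hypothetical three-variable toy risk function; it illustrates the definition only and is not the SHAP package or the authors' model (the variable names and risk increments are invented).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set-valued payoff function.

    value_fn maps a frozenset of feature names to a number (the model output
    when only those features are "present"). Cost is O(2^n) in len(features).
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical PONV risk: baseline plus invented increments."""
    score = 0.10
    if "history" in present:
        score += 0.30
    if "female" in present:
        score += 0.20
    if "history" in present and "female" in present:
        score += 0.10  # interaction term, shared equally by both variables
    return score


phi = shapley_values(toy_risk, ["history", "female", "low_weight"])
print(phi)
```

The values sum to the difference between the full-model output and the baseline, which is the property that lets per-sample SHAP plots decompose a prediction into additive contributions.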

Discussion
In our research, we developed a total of 54 models for the prediction of PONV in patients with PCA at Sichuan Provincial People's Hospital. The AUC of the best model was 0.9469. The best model was also validated on patients from Chengdu First People's Hospital, where the AUC reached 0.8211. Apfel and his colleagues used six models, Apfel, Koivuranta, Sinclair, Palazzo, Gan and Scholz, to study PONV in a European population undergoing orthopedic surgery, gastrointestinal surgery, otorhinolaryngology surgery, and gynecological surgery, and the AUCs of these scoring systems were 0.61 to 0.68 [13]. Jorg M. Engel's team used the Koivuranta, simplified Apfel, Sinclair, and Junger risk scoring systems to predict PONV in otolaryngology surgery, and the AUC of the Sinclair and Junger models was 0.70 [14]. Wu et al. studied five popular scoring systems, the Apfel, Koivuranta, Palazzo and Evans, simplified Apfel and simplified Koivuranta risk scores, which were validated in a Taiwanese population; the AUCs of these scoring systems ranged from 0.62 to 0.67 [1]. The AUC of Shim's machine learning models ranged from 0.561 to 0.686 [11]. Our best model's AUC (0.9469) was significantly higher than that of other models, which may be because it was constructed using the BSMOTE sampling method, the Lasso screening method, and the CatBoost machine learning method. Different data preprocessing methods and machine learning algorithms can lead to different prediction performance [12].
Our study collected almost all of the variables in the perioperative period, probably the largest number of variables in any current prediction model. Among prior studies that developed predictive models for PONV, some explicitly stated that the trials were conducted without the use of antiemetic drugs, and others did not specify whether the patients used antiemetic drugs. None of these studies included antiemetic drugs as a predictive model variable, and therefore our study did not include them either. A history of surgery, intraoperative urine volume, and blood loss were included as variables in our study. Past models have only presented variables that affect PONV and have not assessed the contribution of each variable to the occurrence of nausea and vomiting. In contrast, our model explained the contribution of the different variables to the occurrence of PONV using SHAP. A history of motion sickness and/or PONV, sex, and weight were the three most influential variables. Weight, history of surgery, intraoperative urine volume and height were used for the first time as factors to predict PONV. However, the reasons why these variables are associated with PONV are not clear and need to be further investigated. Two variables, history of motion sickness and/or PONV, and sex, also appear in previous predictive models [2,4] and are recognized as important variables influencing PONV. The mechanism by which a history of motion sickness and/or PONV leads to nausea and vomiting is not particularly clear and may be related to genetics. Previous studies have suggested that the link between genetics and PONV may be the result of anesthetic agent administration and surgical factors interacting with various small genomic differences between individuals [5].
The use of NSAIDs instead of opioids for analgesia is known to reduce the occurrence of PONV, and our model supports this finding [15]. Weight, height and BMI were also considered as factors influencing PONV in our model. Patients with lighter body weight, shorter height and smaller BMI were more likely to experience PONV. In Johansson's study, the included patients had a BMI of 28.3 ± 6.9, and patients with BMI > 35 kg/m² were more likely to experience PONV [16], whereas the patients in our study had a BMI of 23.51 ± 3.54, and only 20 patients had a BMI > 35 kg/m². Differences between races may be responsible for the different results. In our prediction model, a higher intraoperative infusion volume and lower intraoperative urine volume led to an increased probability of PONV. During surgery, when a patient is infused with a high volume of fluid while intraoperative urine volume remains low, there may be a reduction in the effective circulating intravascular volume and an inadequate amount of fluid infusion, or there may be renal impairment [17]. Jewer found that adequate perioperative intravenous crystalloid infusion reduces PONV in ASA I to II patients receiving general anaesthesia for short surgical procedures [18]. In other words, inadequate infusion may lead to an increased incidence of PONV. This finding is in line with our conclusion.
Our prediction model also showed that older age may lead to a greater risk of PONV. However, the Apfel team's prediction model for PONV in outpatients suggests that patients under the age of 50 are more likely to experience PONV [18]. In our study, patient age was not segmented but was included in the machine learning model as a continuous variable. The difference may be related to the sample size, the number of variables, and differences in ethnicity, so further studies with larger samples are needed.
Although multiple comprehensive guidelines and risk assessment models have been published on the subject, PONV continues to plague the surgical population. The most likely reason is the lack of compliance with nausea and vomiting prevention guidelines [19]. Pysyk et al. reported that the incidence of PONV was reduced by annual anesthesiologist performance feedback urging the use of antiemetic medications [20]. Rajan et al. concluded that identifying high-risk patients through predictive models for PONV, using intraoperative and postoperative combinations of multiple types of antiemetic medications, or changing the anesthetic approach can further reduce the incidence of PONV [19]. Therefore, it is possible to reduce the incidence of PONV by using assessment tools to proactively guide clinical practice.
Limitations. This study has some limitations. First, the study population came from only two medical institutions in the same city in western China and did not include patients who underwent head and neck surgery (this group of patients did not receive routine postoperative intravenous analgesia). The sample may therefore not be fully representative, and there may be sample selection bias, which may limit the extrapolation of the results of this study. Second, inhaled anesthetics and propofol were used in most patients, and it was not possible to determine the effect of these two drugs separately on PONV.

Conclusions
In conclusion, we not only constructed machine learning models for predicting PONV, but also identified factors affecting the occurrence of PONV. Three indicators, a history of motion sickness and/or PONV, female sex, and light weight, are especially important for anesthesiologists and surgeons to consider. We hope our prediction model can serve as a reference for clinical decision-making.

Methods
Data sources. Patients who received surgical procedures in Sichuan Provincial People's Hospital from October 2021 to March 2022 were included in this study and were used for the modeling. To externally validate the predictive model, we retrospectively collected data related to patients who underwent surgical procedures from February to July 2022 in Chengdu First People's Hospital. The inclusion criteria were as follows: patients (aged ≥ 18 years) who underwent general anesthesia and postoperative PCA. Exclusion criteria: patients admitted to the intensive care unit (ICU) after surgery. The Ethics Committee of Sichuan Provincial People's Hospital (approval no. 2022-49-1) and Chengdu First People's Hospital (approval no. 2022-HXKT-011) approved this retrospective analysis of routinely collected data and waived patient consent. This study was registered at the Chinese Clinical Trial Registry (Registration number ChiCTR2200056097, principal investigator: Min Xie, http://www.chictr.org.cn/showproj.aspx?proj=151192, date of registration: February 1, 2022). Our study methods were performed in accordance with the guidelines and regulations of the clinical registry. All private personal information was protected and removed during the process of analysis and publication.

Data collection and outcome definition.
Recent studies have found that some factors, including the type of surgery [14,21], anesthesia drugs [22–25], age [18,26], perioperative fasting [27], infusion volume [28], anxiety [29], inhalation anesthetics [27], body mass index (BMI) [16] and operative duration [6], are related to PONV. Therefore, we included as many variables as possible in our prediction model. Some of these variables, such as history of surgery, intraoperative urine volume and blood loss, were not present in past studies.
The clinical information of patients was retrospectively collected from the Hospital Information System (HIS) by scientific research assistants. The medical history and condition of patients were collected by surgeons and recorded in the HIS. The anesthetic protocol and postoperative analgesia formula were determined by the patient's anesthesiologist and were not standardized. The nurses recorded the occurrence of PONV and the use of rescue analgesics in the PACU. When the patient returned to the ward, the anesthesia nurse followed up on PONV and other side effects for 24 to 72 h after the procedure. The anesthesia nurse asked the patients questions about vomiting and nausea, such as, "Have you vomited or had dry-retching?", "Have you experienced a feeling of nausea?" and "When did you experience PONV?". PONV was considered to have occurred when patients had nausea, vomiting, or both. At the same time, the patient's resting pain score and movement pain score were measured by a visual analog scale (VAS). Most PONV occurs within 24 h after surgery and decreases in degree and incidence with time [30]. In this study, only PONV and movement pain scores that occurred within 24 h postoperatively were recorded. All data were collected from the HIS by scientific research assistants, who were blinded to the study hypothesis.
Data partitioning and dataset building. The patients at Sichuan Provincial People's Hospital were divided into a training set and a test set at a ratio of 8:2, used to train and test the models, respectively. Patients at Chengdu First People's Hospital were used to externally validate the developed models. Some of the missing clinical information, such as height, weight, and history of motion sickness and/or PONV, was collected by the research assistant by telephone; other missing data, such as the PCA regimen, were imputed using the random forest method.
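The 8:2 partition can be sketched as a seeded random shuffle of patient indices. This is a minimal illustration under assumptions: the function name, the seed, and the cohort size of 1000 are hypothetical, and the text does not say whether the authors stratified the split by outcome.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Hypothetical cohort size, for illustration only.
train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))
```

Keeping the external-validation hospital entirely outside this split, as the paper does, means the test set and the external set measure different things: in-distribution performance and transfer to another institution.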
To minimize the adverse impact of data imbalance on prediction performance, the synthetic minority oversampling technique (SMOTE) and the borderline synthetic minority oversampling technique (BSMOTE) were applied. Three variable selection methods were used: (1) the Boruta screening method, a feature selection algorithm that identifies the minimal set of relevant variables; (2) the Lasso screening method, which evaluates the importance of variables by introducing a penalization parameter that penalizes and discards unimportant variables; and (3) recursive feature elimination (RFE), which selects the features in a training dataset that are most relevant in predicting the target variable (Fig. 4) [31,32].
Model development. In this process, nine machine learning algorithms were trained for binary classification and applied to develop predictive models: logistic regression, random forest, stochastic gradient descent (SGD), extreme gradient boosting (XGBoost), K-nearest neighbor (KNN), support vector classification (SVC), decision tree, category boosting (CatBoost), and multilayer perceptron (MLP) [31,32]. The dataset of Sichuan Provincial People's Hospital was divided into a training set and a test set at a ratio of 8:2; the training set was used to build models, and the test set was used to evaluate the predictive performance of the models. Internal validation was conducted with tenfold cross-validation in the training set (Fig. 4) [31].
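The oversampling step used to address class imbalance can be illustrated with the core SMOTE idea: each synthetic minority sample is a random interpolation between a minority point and one of its k nearest minority neighbours. The sketch below is a simplified pure-Python version on invented two-feature data, not the library implementation the authors used. As commonly described, the borderline variant (BSMOTE) differs by drawing base points only from minority samples near the class boundary, which is not shown here.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class samples with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Oversampling of this kind is normally applied to the training set only, so that the test set keeps the real class ratio.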
Model evaluation. We used the AUC, accuracy, precision, recall rate, F1 value and area under the precision-recall curve (AUPRC) to evaluate the predictive performance of the models [31]. The AUCs of the different models were compared, and the model with the largest AUC was selected to develop a PONV prediction system for PCA. SHAP was used to explain the contribution of variables to the model [31]. We applied the best model to patients in Chengdu First People's Hospital and used the same quantitative metrics to evaluate its performance (Fig. 4).
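The evaluation metrics named above can be written out from their definitions. The sketch below uses invented toy labels and scores and a classification threshold of 0.5, which is an assumption. The AUPRC is estimated by the step-wise average-precision sum, one common estimator, and the helper assumes no tied scores. It is an illustration of the definitions rather than the authors' code.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 at a fixed probability threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_prob):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each newly recalled positive
    return total / n_pos


# Hypothetical toy data: 1 = PONV occurred, 0 = no PONV.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]

metrics = classification_metrics(y_true, y_prob)
auc_value = roc_auc(y_true, y_prob)
ap_value = average_precision(y_true, y_prob)
print(metrics, auc_value, ap_value)
```

AUC and AUPRC are threshold-free, whereas accuracy, precision, recall and F1 depend on the chosen cut-off, which is one reason to report both groups when classes are imbalanced.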

Sample size validation.
To estimate the impact of sample size on predictive performance, 10% of the samples were randomly extracted from the training set to train the model, and the AUC was evaluated in the test set. The proportion of training samples was then increased from 10 to 100% in increments of 10%. This process was repeated 100 times, and the results were plotted on a line graph [31]. The contribution of sample size to improving the prediction performance of the models was assessed according to the inflection point on the line graph.
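The subsampling procedure above can be sketched as a learning-curve loop: draw a fraction of the training set, fit, score the fixed test set by AUC, repeat, and average. In the sketch below the data are synthetic and the model is a toy centroid-difference scorer standing in for the authors' classifier; both are assumptions made only to keep the example self-contained, and the repeat count is reduced to 20 for speed.

```python
import random


def auc(y_true, scores):
    """Pairwise AUC: fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Toy stand-in model: project onto (mean of positives - mean of negatives)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    dim = len(X[0])
    w = [
        sum(r[d] for r in pos) / len(pos) - sum(r[d] for r in neg) / len(neg)
        for d in range(dim)
    ]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def learning_curve(X_train, y_train, X_test, y_test, repeats=100, seed=0):
    """Mean test AUC when training on 10%, 20%, ..., 100% of the training set."""
    rng = random.Random(seed)
    n = len(X_train)
    curve = []
    for pct in range(10, 101, 10):
        size = max(2, n * pct // 100)
        aucs = []
        for _ in range(repeats):
            idx = rng.sample(range(n), size)
            ys = [y_train[i] for i in idx]
            if len(set(ys)) < 2:
                continue  # subsample lacks one class; skip this draw
            model = fit_centroid_scorer([X_train[i] for i in idx], ys)
            aucs.append(auc(y_test, [model(x) for x in X_test]))
        curve.append((pct, sum(aucs) / len(aucs) if aucs else float("nan")))
    return curve


def make_data(n, rng):
    """Synthetic two-feature data; only the first feature carries signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        X.append([rng.gauss(label * 1.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


data_rng = random.Random(42)
X_train, y_train = make_data(200, data_rng)
X_test, y_test = make_data(100, data_rng)
curve = learning_curve(X_train, y_train, X_test, y_test, repeats=20)
for pct, mean_auc in curve:
    print(pct, round(mean_auc, 3))
```

Plotting `curve` gives the line graph described in the text; a plateau at the larger fractions is the usual sign that additional samples add little.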
Statistical analysis. Continuous variables were described by mean and standard deviation, whereas categorical variables were expressed in terms of frequencies and percentages. Analysis of variance (ANOVA) and rank sum test were used for univariate analysis. Hypothesis testing and model building were implemented using the stats and sklearn packages in Python (V.3.8) [31].

Data availability
All data generated or analyzed during this study are included in the paper and in the Supplementary Materials (Table S1). Raw data on this study are available from the corresponding author on reasonable request.  www.nature.com/scientificreports/